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Tempering is used to change the quark mass while remaining in equilibrium be¬ 
tween the trajectories of a standard hybrid Monte Carlo simulation of four flavours 
of staggered fermions. The algorithm is faster for small enough quark masses, and 
particularly so when more than one mass is required. 


1 Introduction 


The standard-algorithms used for simulations of full QCD are slow, especially 
for topology El. It is not yet possible to generate an ensemble of full QCD 
configurations which samples the global topological charge adequately. 

Simulated tempering Era, which traditionally promotes a parameter to a 
dynamic variable that changes during the simulation, has been successfully 
implemented in (3 for spin glasses. We will promote the quark mass, and change 
it between the trajectories of a standard hybrid Monte Carlo algorithmic for 
four flavours of staggered fermions. 

The choice of a new mass is made with a Metropolis step, and insures that 
the change is only made if the configuration is also part of the equilibrium 
distribution of the new mass. So unlike other similar algorithms there is no 
need to ‘fix’ the distribution at the end. 

The relevant measure of the speed of an algorithm is the integrated auto- 
correlationta, obtained from the auto-correlation function C°(t) for an observ¬ 
able O. The integrated auto-correlation time for O is defined to be 
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C°(r) 


(1) 


where <J max and <7 m i n are the maximum and minimum variances under blocking. 
With too little data, IVdata < 1000T; nt , an accurate value for ri nt cannot be 
obtained. An estimator is the rightmost expression in I)- 

“Talk presented at the conference Multi-Scale Phenomena and their Simulation from 
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The largest value for r? t over the set of all observables O defines the 
slowest mode, and hence the number of independent measurements. For full 
QCD the global topological charge Q seems to be the slowest observablc|. 


2 Simulated Tempering 

The quark mass m q becomes a dynamic variable, and takes a new value for 
each trajectory from a discrete, ordered set with N m elements, [m m i n ,..., m max ]. 
This can also be seen as ranging from ‘slow’ to ‘fast’. The only requirement 
is that the action histograms of neighbouring masses overlap. The simulation 
performs a random walk of length TV ^ between the lowest and highest masses. 

The probability distribution now simulated is P(U, (f>, i ), the probability of 
having the configuration with the set of gauge fields U and (pseudo-)fermion 
fields 4> generated at quark mass m^. For each i it is the same as the original 
distribution P(U , </>), of course. This is given by 


P(U, 4>, mi) oc exp[-S(U, <)>, (3, + g { ] (2) 

where the gi are pre-determined constants governing the probability P(i) of 
generating configurations at mass nii. This distribution is simulated using 
your favourite algorithm for fixed quark mass (here HMC), combined with 
the simulated tempering Metropolis steps to change from mi to rrii±i. The 
hybrid Monte Carlo algorithm insures that the correct Gibbs distribution is 
generated at each value of the mass, and the ST Metropolis step insures that 
the configuration is also an equilibrium configuration at the new mass. 

The constants gi can be fixed by choosing, for example, to visit each mass 
with equal probability, P(i) = 1/N m . Then gi = — In Zi, ie. the original free 
energy at fixed mass mi. P{i) is arbitrary, and can be optimized for speed. 
The simulation only needs gi+i — gi, though, estimated from 

A.g = 9i+ 1 - 9i = VSm - ( \)V(5m ) 2 + 0({5m) 3 ) (3) 

where (ipip) and (%) are the chiral condensate and susceptibility. The require¬ 
ment of overlapping histograms implies that 5m satisfies 

5m ~ l/\/ (x)V ~ m a /W. (4) 

b It may be that the topological charge density alone is relevant for most observables in 
QCD. If so, slow evolution of the global topological charge may not be such a problem. How¬ 
ever, this question can only be clarified once the topological sector itself has been adequately 
explored. 
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Table 1: Results for single mass HMC runs at m = 0.01 and 0.02, and simulated tempering 
between 0.01 and 0.02, and between 0.02 and 0.03. All times are in seconds. The number in 
brackets indicates the time for 80 random vectors needed to determine A g. The values for 
Ant f° r the ST runs are allocated correctly! So more statistics are needed! 


m 


Congrad 

ratio 

Time 
per traj. 

ST 

overhead 

Atraj 

' int 

Block 

length 


0.01 


660 


420 

45(3) 

105 


0.02 


325 


450 

28(2) 

112 

ST 0.01 

- 0.02 

2.1 

575 

15(240) 

744 

30(4) 

186 

ST 0.02 

- 0.03 

1.5 

280 

10(160) 

1150 

48(4) 

105 


In a short preliminary run new values for A g which yield a flatter distribution 
can be obtained by balancing the energy difference for the changes i —> i + 1 
and i + 1 —> i. 

The overhead depends largely on N m , which itself depends on the step size 
5m. As the susceptibility is related to the scalar meson mass, x = Ar/m^, 
the step size is large in the chiral limit (at zero temperature, but sadly not 
near the chiral transition). Hence simulated tempering becomes dramatically 
more effective for very small quark masses, with the gain in speed more than 
compensating for the N‘? n cost of having additional masses. 

3 Results 

We have two runs on 16 3 x 24 lattices, one with six evenly spaced masses in the 
range 0.01 - 0.02, the other with nine evenly spaced masses in the range 0.02 

0.03. The results obtained indicate that seven is the optimal number. The 
initial estimates of A g from (||) were made using data for the chiral condensate 
and susceptibility from the HMC runs. These proved adequate, but were tuned 
twice to bring P(i) closer to 1 /TV. 

We use two hybrid Monte Carlo trajectorietj] at fixed quark mass, and then 
a Metropolis step to change the mass. Trajectories are of length r = 0.6 with 
step sizes St = 0.005 (m q = 0.02 and ST 0.02 — 0.03) and St = 0.004 (m q = 
0.01 and ST 0.01 - 0.02). 

For four staggered fermions one only needs one extra inversion per ST 
step for the transition from the set of variables {17, <j>, i} to {17, + This 

introduces negligible overhead. However, for the runs here we have used 160 

c This is conservative, but it insures that a configuration is seldom used for a second ST 
step. 
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Mass 




Trajectory No. 

Figure 1: The time history of the mass mi Figure 2: The time history of Q from sim- 

for the trajectory in the simulated tempering ulated tempering 0.01—0.02 and pure HMC 

run 0.01—0.02. at 0.02 (Q — 5) and 0.01 (Q + 5). 


inversions (80 Gaussian noise vectors) to eliminate the {</>, 0} spread, in order 
to estimate the A gi with as few trajectories as possible. This is not strictly 
correct for simulating four staggered fermions, where P(U , (/), <f>) should be simu¬ 
lated. This overhead will be larger for algorithms needing the full determinant, 
although new methods for estimating the determinant □ may help. 

The time history of the mass rrii is shown in figure [I] for the case 0.01— 
0.02. The estimates for A g require one further adjustment, as there is still 
a bias towards the low values of the mass. Even with relatively little tuning, 
though, the algorithm visits all masses in the set. 

The time history of the global topological charge, obtained as in0, is shown 
in figure |^. There may be more movement in the topology for the simulated 
tempering run than there is at fixed m = 0.01, and less at m = 0.02. 

In table [I] the simulated tempering runs are compared with standard HMC 
runs. The column CG ratio indicates the purely algorithmic potential for 
acceleration, the ratio of the total number of conjugate gradient iterations 
needed for a complete trajectory at the smallest and largest masses of the set. 

We extract T; nt using ([I]) with blocking, as we do not have anywhere near 
the statistics needed for an accurate estimate of Tj nt . The true values may 
well be much larger, so these should only be taken as an estimate of the lower 
bound. The error estimate comes from a jack-knife analysis. 

From the numbers for Tint it seems plausible that the physics moves with 
at least the same speed in ST, and perhaps a little faster if one is optimistic. 
So less time per trajectory does mean less time for decorrelating observables. 

The actual algorithmic acceleration can be seen in the next two columns. 
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Since the step size was kept constant for all masses in a set, the gain is lower 
than in column CG ratio. If only the lowest mass is needed, simulated temper¬ 
ing brings little for the range 0.02—0.03, but is promising for 0.01—0.02. It 
would be faster, probably about 400s rather than 575s were the step size also 
changed with the mass, and using the final numbers for A g. 

Of course, the values obtained via simulated tempering for observables like 
the plaquette, chiral condensate etc. agree with those from standard runs. 


4 Conclusions 

Simulated tempering does speed up full QCD simulations. It is especially useful 
when going to very small masses, such as m q < 0.01 for staggered fermions, as 
the step size goes to a constant at zero mass. It is also more useful if results 
from more than one mass are required. It may also be applied to other fermion 
types, eg. Wilson fermions via k. 

The conservative choice of parameters used here do not exhaust the po¬ 
tential for acceleration, and leave much room for improvement. Longer trajec¬ 
tories, or one trajectory (rather than the conservative two here) between sim¬ 
ulated tempering steps will also reduce the overhead. Another improvement is 
to run simultaneously at .each mass in the set, and then swap configurations 
between adjacent masses El, which implements the parallel tempering method 
for fermions. This method requires considerably more memory, though, and 
cannot be used if memory rather than speed limits the largest lattice used in a 
simulation. Parallel tempering is under investigation, as well as an implemen¬ 
tation with Wilson fermions. 
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